Loading necessary libraries
library(gprofiler2)
library(tidyverse)
library(readxl)
Check your working directory
- Paths provided within this .rmd assume you are working within the
Hackathon19_visualise_ontology project directory. Provided you have opened Hackathon19_visualise_ontology as a project this should already be the case.
- You can check by doing the following:
## [1] "/home/rreynolds/misc_projects/Hackathon19_visualise_ontology"
Running gProfiler2
- Input genes are sourced from the latest PD GWAS.
# import PD GWAS gene list
geneList <-
readxl::read_excel(path = "./data/TableS2_Detailed_summary_statistics_nominated_risk_variants.xlsx") %>%
# Extract "Nearest Gene" column as vector
.[["Nearest Gene"]] %>%
# Remove NAs
na.omit() %>%
# Convert to character vector
as.character() %>%
# Keep only unique genes
unique()
# running gprofiler
gprofilerOutput <- gprofiler2::gost(geneList,
organism = "hsapiens", # set the organism
correction_method = c("gSCS"), # select a correction method (gSCS reccomended in gost documentation)
domain_scope = c("annotated"), # select whether to restrict the background set
sources = c("GO"), # databases would you like to search for pathways & processes (others: "KEGG","REAC","WP")
significant = TRUE, # when TRUE displays only significant results (p<0.05)
evcodes = FALSE, # when TRUE displays the IDs of the genes calling each pathway
ordered_query = FALSE) # when TRUE the input list is ordered in some meaningful way
- The background list (
domain_scope argument) is an important choice, as it should represent the population of genes your list was sampled from. E.g. If you were to perform a differential gene expression experiment in brain samples from control and PD, your background list would be all the expressed genes you tested.
What do the results mean?
gprofilerOutput$result %>%
DT::datatable(rownames = FALSE,
options = list(scrollX = TRUE),
class = 'white-space: nowrap')
- gprofiler outputs a number of columns, a few of which are not entirely self-explanatory, including:
- precision: this is the proportion of genes in the input gene list that can be assigned this term i.e. overlap.size/query.size
- recall: this is the proportion of genes in the input list that overlap with the term i.e. overlap.size/term.size
- parents: GO terms belong to families of terms. This column will provide the parent terms to the GO term found enriched.
Visualising the results as a barplot
source("./R/gen_gprofiler_barplot.R")
gen_gprofiler_barplot(DF = gprofilerOutput$result,
num_categories = 10,
plot_title = "")

Visualising the results with gprofiler’s in-built visualisation
gprofiler2::gostplot(gprofilerOutput)
Visualising as a network?
- Nodes would represent GO terms and edges the genes shared between these GO terms
- Potential packages:
Session info
- It is considered good practice to always your session info in the html output of an R markdown.
- Session info will depend on the RStudio you are using, your operating system and the packages you load. Thus, when you knit the final product of your code, it may not look like it does in this version of the html output.
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
## [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] plyr_1.8.4 data.table_1.12.2 gridExtra_2.3
## [4] scales_1.0.0 readxl_1.3.1 forcats_0.4.0
## [7] stringr_1.4.0 dplyr_0.8.3 purrr_0.3.3
## [10] readr_1.3.1 tidyr_1.0.0 tibble_2.1.3
## [13] ggplot2_3.2.1 tidyverse_1.2.1 gprofiler2_0.1.7
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.2 lubridate_1.7.4 lattice_0.20-38
## [4] assertthat_0.2.1 zeallot_0.1.0 digest_0.6.21
## [7] mime_0.7 R6_2.4.0 cellranger_1.1.0
## [10] backports_1.1.4 evaluate_0.14 httr_1.4.1
## [13] pillar_1.4.2 rlang_0.4.0 lazyeval_0.2.2
## [16] rstudioapi_0.10 DT_0.9 rmarkdown_1.18
## [19] labeling_0.3 htmlwidgets_1.3 RCurl_1.95-4.12
## [22] munsell_0.5.0 shiny_1.3.2 broom_0.5.2
## [25] compiler_3.6.1 httpuv_1.5.2 modelr_0.1.5
## [28] xfun_0.11 pkgconfig_2.0.3 htmltools_0.3.6
## [31] tidyselect_0.2.5 viridisLite_0.3.0 crayon_1.3.4
## [34] withr_2.1.2 later_0.8.0 bitops_1.0-6
## [37] grid_3.6.1 nlme_3.1-141 jsonlite_1.6
## [40] xtable_1.8-4 gtable_0.3.0 lifecycle_0.1.0
## [43] magrittr_1.5 cli_1.1.0 stringi_1.4.3
## [46] promises_1.0.1 xml2_1.2.2 ellipsis_0.3.0
## [49] generics_0.0.2 vctrs_0.2.0 tools_3.6.1
## [52] glue_1.3.1 hms_0.5.1 crosstalk_1.0.0
## [55] yaml_2.2.0 colorspace_1.4-1 rvest_0.3.4
## [58] plotly_4.9.1 knitr_1.25 haven_2.1.1